1 Introduction
Presenting analyzed small multiples (e.g., patches of medical images, miniature visualizations of a large genomic sequence) using latent vectors learned by machine learning (ML) models has become a common practice in many visual analytics systems [6, 10]. A latent vector, usually represented as multi-dimensional quantitative values, is a compact representation of the analyzed data that captures relevant information. For example, a 64 × 64 pixel image can be represented as a 10-dimensional latent vector. Compared with analysis using raw data or human-crafted metrics, latent vectors enable users to organize and explore a large amount of data and to conduct analysis tasks, such as finding similar items and identifying outliers, more efficiently.
Even though latent vectors can accurately capture patterns extracted from the analyzed data, they cannot be directly interpreted by humans the way the original images or texts can. For this reason, latent vectors are usually used to represent the similarity between data items, under the assumption that the latent vectors of two similar items are close in the latent space. For example, dimension reduction methods (e.g., t-SNE [58], UMAP [41]) are widely used to visualize latent vectors in 2D space, showing the similarities and differences among data items. Other prior studies proposed hierarchically clustering items based on their latent vectors to conduct pattern-driven visual analytics [6]. However, the definition of “similar items” varies depending on the analysis task, and no single definition applies to all scenarios. Even though some prior studies have incorporated user input to learn users’ perception of similarity [10, 32] and even to extract human-readable concepts (e.g., gender from face images) [37, 68], visual analytics based on latent vectors still suffers from their limited interpretability.
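As a minimal illustration of this similarity-based use of latent vectors, the sketch below ranks items by Euclidean distance in a hypothetical 10-dimensional latent space. It is a stdlib-only toy (the random vectors and the `most_similar` helper are ours for illustration, not part of any system described here):

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, vectors, k=3):
    """Indices of the k items whose latent vectors are closest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]

# Toy data: 100 hypothetical 10-dimensional latent vectors.
random.seed(0)
latents = [[random.gauss(0, 1) for _ in range(10)] for _ in range(100)]
neighbors = most_similar(latents[0], latents, k=4)
```

Since the query item is itself in the collection, it always appears first with distance 0; the remaining indices are its nearest neighbors in latent space.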
Disentangled representation learning (DRL) [12, 22] is a promising approach that can provide more explainable latent vectors through unsupervised learning, i.e., without human labels. By disentangling features and encoding them as separate dimensions of the latent vectors, DRL can generate latent vectors whose values carry semantics and can reveal human-understandable concepts, e.g., the value on one dimension indicates whether a person is smiling or not (Figure 1d). We call such dimensions semantic dimensions. Some recent visualization tools [18, 20, 59] have successfully employed DRL in their analysis and demonstrated its effectiveness. For example, Gou et al. [18] used DRL on traffic light images to summarize them based on human-readable concepts, such as color, brightness, and rotation. These studies usually assumed that the learned semantic dimensions can perfectly capture human concepts and that the concepts can be accurately represented by a set of synthesized images. However, these assumptions do not always hold.
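Inspecting what a DRL dimension encodes is typically done by a single-dimension traversal: sweeping one latent value while keeping all others fixed, then decoding each variant into a synthesized image. The vector manipulation can be sketched as follows (the `traverse_dimension` helper is ours, and the decoder that would consume its output is not shown):

```python
def traverse_dimension(latent, dim, values):
    """Vary one latent dimension while keeping all others fixed.

    Feeding the resulting vectors to a DRL decoder (not shown) would yield
    the synthesized images used to interpret what this dimension encodes.
    """
    variants = []
    for v in values:
        z = list(latent)      # copy, so the other dimensions stay untouched
        z[dim] = v
        variants.append(z)
    return variants

base = [0.0] * 10             # a hypothetical 10-dimensional latent vector
sweep = traverse_dimension(base, dim=3, values=[-2, -1, 0, 1, 2])
```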
Potential mismatches can exist between the semantic latent dimensions learned by ML models and human concepts. As shown in Figure 2a, one latent dimension correlates with the angle of a human head according to the synthesized images. But when using this dimension to organize images, our experiment results show that the model confuses “angle of the head” with “whether part of the face is covered”, e.g., covered by a dark shadow or a flower (Figure 2b). Meanwhile, previous studies focus on using DRL to diagnose supervised ML models rather than on building an understanding of the data [18, 20]. They provide limited discussion of user needs in understanding and utilizing DRL for concept-driven data exploration.
This study aims to provide a more interpretable and flexible visual exploration of small multiples by better aligning the concepts of human users with the semantic latent vectors generated by ML models. We propose Drava, an interactive system that utilizes Disentangled Representation learning as A Visual Analytics approach for concept-driven data exploration. In Drava, a dataset is represented as a set of small multiples [57], i.e., a series of basic charts or graphics that show instances or different slices of the dataset (Figure 1). Hereafter, we call each small multiple a data item. For each data item, DRL learns a multi-dimensional latent vector, certain dimensions of which have semantic meanings. Drava supports an interpretable exploration of these items by helping users correlate and align the semantic dimensions with human concepts. The interactive visualizations and algorithms in Drava are motivated and guided by a three-step workflow that we propose. Throughout this workflow, users 1) understand ML-learned semantic dimensions and identify their potential mismatches with human concepts, 2) refine and align ML semantic dimensions with human concepts, and 3) generate new knowledge about the analyzed data through concept-driven exploration. In particular, Drava automatically ranks latent vectors and proposes a concept adaptor that can refine a concept based on human input. Meanwhile, a set of interactions based on visual piles [33] is provided, enabling users to effectively arrange, summarize, and compare items based on human-readable concepts. We demonstrate the usefulness of Drava through experimental validation and four usage scenarios. Drava is available at https://qianwen.info/DRAVA/.
3 Related Work
First, since Drava aims to assist data exploration using explainable latent vectors, it is closely related to visual analytics on latent vectors and, more broadly, visual analytics for ML models whose hidden layers generate latent vectors of the input data.
Many visual analytics tools have been proposed to support interactive exploration of latent vectors. Dimensionality reduction techniques, such as t-SNE [58], UMAP [41], PCA [1], and their variants [35, 66], are widely used to assist the visualization of latent vectors. Most of them focus on analyzing the latent vectors generated by a specific model [67], such as a convolutional neural network [25, 34, 46], a graph neural network [24], or a recurrent neural network [35, 42, 54]. Other studies aim to provide more generic methods for visually exploring the latent space [7, 37, 51]. Most relevant to our study is LSC [37], which provides comprehensive support for mapping and comparing semantic dimensions in the analysis of latent vectors. However, LSC requires users to manually identify semantic dimensions, either by importing data labels or by interactively grouping items.
Apart from showing latent vectors, previous studies have combined interactive visual analytics with interactive or explainable ML to introduce interpretability into the analysis of latent vectors [18, 23, 68]. Several studies [18, 20, 59] used DRL to extract semantic dimensions and associate model performance with human concepts (e.g., the brightness of images, the location of objects). The semantic dimensions learned by DRL are directly used without refinement, mostly because they are low-level concepts that can be easily extracted by ML. Jia et al. [23] proposed a visual explainable active learning approach that asks users questions and uses their answers to learn explainable attributes that can be used to classify images from unseen classes. Zhao et al. [68] proposed a visualization tool where users can explore and label image patches with a certain concept. These labels are used to train a concept extractor network, enabling users to diagnose model predictions using the learned concept.

However, these studies mainly focus on understanding the working mechanism of ML models and improving model performance (i.e., VIS for ML). How to utilize explainable latent vectors for concept-driven data exploration (i.e., XAI for VIS) has not been extensively discussed. Drava is built upon previous visual analytics studies on latent vectors and ML models. Unlike these studies, Drava focuses on aligning interpretable latent vectors with human concepts to assist concept-driven data exploration.
Second, Drava learns the visual representation of and supports the exploration of small multiples [57], a series of miniature visualizations that represent different facets, subsets, or instances of a dataset. Current studies in visual data exploration usually present small multiples as points (e.g., [7, 16, 47, 51]), glyphs (e.g., [29, 63]), or images (e.g., [18, 27, 37]) and place them in a grid, a dimension reduction projection, or a data-driven layout. For example, Sharkzor [27] enabled users to interactively organize images and their groups while providing visual cues for the groups (e.g., badges). AxiSketcher [29] used glyph representations and offered sketch-based interactions to flexibly arrange data items in 2D space. Even though these studies provide valuable insights, they offer limited support for inspecting and summarizing a group of small multiples, which is important for revealing and removing the mismatches between human concepts and ML semantic dimensions. Some interaction techniques have been proposed to better organize small multiples and facilitate their exploration, such as interactive piling [4, 30, 33] and hierarchical clustering [6, 31]. For example, interactive piling is inspired by physical piles and enables users to effectively group, aggregate, browse, and compare small multiples. However, these interactions are usually designed for specific application scenarios and cannot be directly applied to concept-driven exploration. In Drava, we adapt interactive piling to facilitate the concept-driven exploration of small multiples, focusing especially on the interpretation of semantic dimensions, the identification of mismatches between ML semantic dimensions and human concepts, and guidance on refining semantic dimensions.
Third, to better guide user exploration and insight generation, researchers have proposed interactive ML for visual data exploration, which learns what visual concepts are important to users from user feedback [5, 10, 15, 32, 61]. For example, Behrisch et al. [5] trained a classifier to interactively capture users’ notion of interestingness when exploring many scatter plots. This classifier is then used to recommend potentially interesting plots and guide the exploration of large multidimensional data. Cai et al. [10] provided an interactive tool that empowers users to refine an ML model by communicating what types of similarities are most important when searching certain medical images. Peax [32] proposed an efficient and accurate query of a certain visual pattern in sequential data by learning from users’ binary feedback on samples selected through an active learning strategy. However, prior studies mainly use interactive ML to assist with similarity queries, i.e., modeling the similarity between items and user-selected targets. Despite the helpful guidance that these studies provide in data exploration, they cannot provide a comprehensive overview of the analyzed data.

Like these approaches, Drava learns from user input to provide more precise exploration guidance. Furthermore, Drava provides semantic dimensions and supports summarization, exploration, and analysis based on different visual concepts.
4 Workflow and Tasks
In this section, we decompose the overall goal of concept-driven visual exploration using DRL into three main steps (Figure 3). We discuss the user tasks within each step from two aspects: the characteristics of DRL, as discussed in the DRL literature [8, 22, 28]; and the user needs in visual data exploration, largely informed by the task summarization work in previous studies [16, 33, 37].
These user tasks have been well established in previous studies and can be reused to effectively guide the design of Drava. Moreover, reuse in task analysis can increase design quality and reduce expenditure, as recommended in [44, 55, 56].

Step 1: Interpret ML Semantic Dimensions. Since only a subset of the latent dimensions correlates with semantic meanings, users should be assisted in identifying the semantic dimensions efficiently (T1.1). For a specific dimension, users can interpret its semantic meaning (T1.2) through 1) synthesized images generated by a single-value traversal of this dimension or 2) data items sorted and grouped by their values on this dimension. A group summary can help users efficiently understand the semantic meaning of a large number of items, associate it with a human concept, and identify mismatches. Unlike previous studies that group items based on their overall similarities, concept-based analysis requires grouping and summarizing items based on certain concepts. Therefore, proper aggregations should be provided to highlight the concept of interest and fade out others (T1.3) when summarizing an item group.
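The sorting-and-grouping route to interpretation can be sketched in a few lines: bin items by their value on one latent dimension, then summarize each bin. This is a stdlib-only illustration (the `group_by_dimension` helper and equal-width binning are our assumptions, not Drava's actual aggregation):

```python
def group_by_dimension(latents, dim, n_groups):
    """Split items into equal-width value bins along one latent dimension,
    mirroring how data items can be sorted and grouped to interpret a
    semantic dimension."""
    values = [z[dim] for z in latents]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups or 1.0   # guard against a constant dimension
    groups = [[] for _ in range(n_groups)]
    for i, v in enumerate(values):
        b = min(int((v - lo) / width), n_groups - 1)
        groups[b].append(i)
    return groups

# Toy items whose value on dimension 0 increases from 0.0 to 0.9.
items = [[i / 10] for i in range(10)]
groups = group_by_dimension(items, dim=0, n_groups=2)
```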
Step 2: Align ML Semantic Dimensions with Human Concepts. Once a mismatch is identified, users modify the semantic dimension to better align it with the human definition of the concept. Such refinement should be user-friendly and conducted upon objects that users are familiar with (T2.1), e.g., data items and item groups rather than the numerical values of latent dimensions. Meanwhile, visual cues should be provided to guide and facilitate the user refinement (T2.2), e.g., highlighting the items that are grouped wrongly due to a concept mismatch.
Step 3: Generate New Human Knowledge about the Data. Users explore the data items based on the identified concepts (T3.1) to generate insights about the analyzed items, including the distribution of items over one or multiple visual concepts and the associations between different concepts. Such analysis can be further enhanced by correlating the concepts with other item metadata (T3.2), such as spatial information and item labels.
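For numeric metadata, the correlation mentioned in T3.2 can be as simple as a Pearson correlation between a semantic dimension's values and the metadata field. A stdlib-only sketch (the `pearson` helper is ours for illustration; any statistics library would do the same):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between one semantic dimension's values (xs)
    and a numeric metadata field (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A correlation near +1 or -1 suggests the metadata field tracks the concept; values near 0 suggest they are unrelated.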
The three steps are interconnected (i.e., the arrows in Figure 3). For example, users may go directly from Step 1 to Step 3 if they do not observe obvious mismatches. Users can also go back from Step 3 to Step 2 if they find that some semantic dimensions fail to support their analysis tasks and require further refinement. Drava provides a set of dedicated interactive visualizations and algorithms that are closely coupled with this three-step workflow.
7 Experimental Validation
In this section, we evaluate the back-end model in Drava from three aspects: 1) the representativeness of the latent vector, 2) the semantic meaning of individual latent dimensions, and 3) the improvements from concept fine-tuning. Previous studies either focused on assessing the disentanglement of latent dimensions [8, 22, 28] or overlooked the possible mismatches between human concepts and semantic dimensions [18, 20, 59]. Therefore, it is important to validate the quality of these semantic latent vectors and their fine-tuning mechanism.
Representativeness of the Latent Vector. We used the reconstruction quality to show whether the latent vectors can capture all the important visual features of the input data. Figure 9 exemplifies the reconstruction quality of the latent vectors for the four datasets used in the application scenarios (section 8). Instead of the absolute similarity or the realism of the reconstructed images, we focused on evaluating whether the reconstructed images are able to capture important concepts. For the relatively simple dsprites shapes dataset (b), the model is able to generate images that are very similar to the input data. For the more complex datasets (a, c-d), even though some details of the input data are missing, the model can still reconstruct the salient concepts.
Semantic Meaning of Individual Latent Dimensions. To evaluate whether a single latent dimension can sufficiently depict a concept, we classified items based on their values on a certain semantic dimension and reported the classification accuracy. Specifically, for n classes belonging to a concept, n − 1 thresholds are learned to classify items. For example, the “smiling” concept has two classes, smiling and not smiling. We first identified a latent dimension Di that is related to the “smiling” concept. We then classified each item based on whether its value on this dimension di is larger or smaller than a threshold thr, which was chosen to maximize the classification accuracy over all items. We used the dsprites and CelebA datasets because they have labels for a diverse set of concepts. The results in Table 1 demonstrate that the latent dimension value can effectively represent the corresponding concept, but they also show room for further improvement.
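A minimal sketch of the threshold search for the binary case (n = 2), assuming each item has a scalar value on the chosen dimension and a 0/1 label. The `best_threshold` helper and the midpoint candidate set are our illustration of the idea, not the paper's exact procedure:

```python
def best_threshold(values, labels):
    """Search for the threshold thr on a single latent dimension that
    maximizes binary classification accuracy (labels are 0/1).
    Candidate thresholds are midpoints between consecutive sorted values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    candidates = [pairs[0][0] - 1.0] + [
        (pairs[i][0] + pairs[i + 1][0]) / 2 for i in range(n - 1)
    ]
    best_thr, best_acc = candidates[0], 0.0
    for thr in candidates:
        acc = sum((v > thr) == bool(y) for v, y in pairs) / n
        acc = max(acc, 1.0 - acc)   # the dimension's polarity may be flipped
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

# Toy data: a well-separated dimension yields perfect accuracy.
thr, acc = best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

For n classes, the same idea extends to learning n − 1 thresholds that partition the dimension's value range.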
Improvements from Concept Fine-tuning. We evaluated the fine-tuning mechanism of the concept adaptor by comparing the classification accuracy for a specific concept before and after user refinement. This evaluation used the “scale” concept from the dsprites dataset and the “smiling” and “bangs” concepts from the CelebA dataset, because they have relatively low accuracy without any human refinement (Table 1). We chose an active learning method as the baseline for evaluating the concept adaptor. The baseline had the same architecture as the concept adaptor.
We used simulated user feedback to obtain reproducible results in a variety of settings. Following common practice in evaluating interactive machine learning [13] and active learning [49], we simulated user feedback as an oracle (i.e., always providing correct labels to the queried items). Both the concept adaptor and the baseline used the same simulation at each iteration but different initializations. The active learning baseline was initialized with 5% of the labels. The concept adaptor was initialized with no labels but with the same item groups as in Table 1. Such an initialization simulates how users would divide items into several groups for a specific concept based on their latent dimension values. At each iteration, N items were refined (for the concept adaptor) or labeled (for the baseline), and the models were trained until the validation loss stopped decreasing, which typically took around 10-20 epochs and less than 20 seconds. We experimented with three metrics for selecting the N items: the uncertainty score of the classification, the standard deviation of the latent dimension value, and the difference between the latent dimension value and the classification threshold. We found that refining the items with the highest uncertainty scores led to the best model performance. Even though we used an oracle to simulate user refinement here, real-world users can easily examine and label these items in Drava by selecting a metric of interest as the y axis in the Item Browser.
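The uncertainty-based selection with an oracle can be sketched as a single refinement iteration over binary prediction probabilities. This stdlib-only simulation is in the spirit of the setup described above; the `refine_with_oracle` function and the toy data are our assumptions, not Drava's implementation:

```python
def refine_with_oracle(probs, labeled, oracle_labels, n_per_iter):
    """One simulated refinement iteration: pick the n_per_iter unlabeled items
    whose predicted probability is closest to 0.5 (highest uncertainty) and
    label them with the oracle, which always answers correctly."""
    unlabeled = [i for i in range(len(probs)) if i not in labeled]
    unlabeled.sort(key=lambda i: abs(probs[i] - 0.5))   # most uncertain first
    for i in unlabeled[:n_per_iter]:
        labeled[i] = oracle_labels[i]
    return labeled

# Toy example: four items with binary prediction probabilities; items 1 and 3
# are the most uncertain and are therefore queried first.
labels = refine_with_oracle(
    probs=[0.9, 0.5, 0.1, 0.6], labeled={}, oracle_labels=[1, 1, 0, 0], n_per_iter=2
)
```

In the real experiment, the model would be retrained after each such iteration before the next batch of uncertain items is selected.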
We ran experiments under three settings, with N set to different fractions of the items. A total of 15 iterations was performed for each experiment. The results in Figure 10 were obtained by averaging the results of three experiments. First, the increased accuracy indicated that the concept adaptor helped align a concept with a semantic latent dimension. Compared with the baseline, the concept adaptor generated more accurate concepts by leveraging the values of the semantic dimension. Second, the curves of the concept adaptor were smoother than those of the baseline, indicating a more stable improvement over iterations.
Third, while the concept adaptor and the baseline required the same amount of user effort at each iteration (i.e., the same N and the same user simulation), the concept adaptor required less user effort at initialization than the baseline (i.e., drawing two or three lasso selections vs. labeling 5% of the items one by one). Fourth, it was not surprising that the difference between the concept adaptor and the baseline decreased as N and the number of iterations increased. The advantages of the concept adaptor mainly result from using the semantic dimension values; as more and more items are labeled, these semantic dimensions become less useful in describing a concept.